Retrieval of Relevant Parts of Document Images Based on 2D Density Distributions of Characters

Authors

  • Koichi Kise
  • Masaaki Tsujino
Abstract

This paper presents a new method of document image retrieval that can spot the parts of document images relevant to users’ queries. This improves the effectiveness and usability of retrieval, since users are relieved of the burden of finding relevant parts in retrieved documents. The proposed method is based on the assumption that parts of document images which densely contain the characters in a query are relevant to it. To rank relevant parts, two-dimensional density distributions of characters are calculated from layout features such as the locations of characters and the distance to the nearest characters. Experimental results on retrieving Japanese newspaper articles show that the proposed method is superior to a method without a function for retrieving parts.
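The core assumption, that regions densely containing query characters are likely relevant, can be illustrated with a minimal sketch. This is not the authors' algorithm: it simply counts query characters on a regular grid and sums the counts in a square window, and it ignores the distance-to-nearest-character feature mentioned in the abstract. The function names (`density_map`, `smooth`, `best_cell`), the grid cell size, and the window radius are all hypothetical choices made for illustration.

```python
import math

# Each recognized character is assumed to be given as (char, x, y),
# where (x, y) is its position on the page. This input format is hypothetical.


def density_map(chars, query, width, height, cell):
    """Count query characters falling in each cell of a regular grid."""
    cols = max(1, math.ceil(width / cell))
    rows = max(1, math.ceil(height / cell))
    grid = [[0] * cols for _ in range(rows)]
    wanted = set(query)
    for ch, x, y in chars:
        if ch in wanted and 0 <= x < width and 0 <= y < height:
            grid[int(y // cell)][int(x // cell)] += 1
    return grid


def smooth(grid, radius):
    """Sum counts in a (2*radius+1)-square window around each cell."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += grid[rr][cc]
            out[r][c] = total
    return out


def best_cell(density):
    """Return (row, col) of the densest cell (first one on ties)."""
    cells = ((r, c) for r in range(len(density))
             for c in range(len(density[0])))
    return max(cells, key=lambda rc: density[rc[0]][rc[1]])


# Toy page: two query characters near the top-left, three near the bottom-right.
chars = [("a", 5, 5), ("b", 15, 5), ("a", 95, 95),
         ("b", 85, 95), ("a", 95, 85), ("z", 50, 50)]
grid = density_map(chars, "ab", width=100, height=100, cell=10)
density = smooth(grid, radius=1)
row, col = best_cell(density)
```

On this toy page the densest cell lies in the bottom-right corner, where three query characters cluster. A ranking of page parts could then be read off the smoothed map, although the weighting actually used in the paper may differ.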

Similar resources

Document Image Retrieval Based on 2D Density Distributions of Terms with Pseudo Relevance Feedback

Document image retrieval is the task of retrieving document images relevant to a user’s query. Most existing methods based on word-level indexing rely on the “bag of words” representation, which originated in the field of information retrieval. This paper presents a new representation of documents that utilizes additional information about the location of words in pages so as to improve th...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach to document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

Publication date: 2004